feat(sparserl-sync): integrate SparseRL-Sync sparse diff tracking into optimizer by tonic-scitix · Pull Request #5 · scitix/Megatron-LM

tonic-scitix · 2026-05-15T06:29:22Z

feat(sparserl-sync): integrate SparseRL-Sync sparse diff tracking into optimizer

What does this PR do?

Instrument the Megatron optimizer's in-place weight-copy sites so the
SparseRL-Sync package can compute per-param sparse-update indices for each
rollout. When sparse_update is not installed the imports fall back to
nullcontext no-ops and the upstream behaviour is fully preserved.

This PR is one of three coordinated changes:

This PR — Megatron-LM: optimizer instrumentation
slime-trainer: feat(sparserl-sync): integrate SparseRL-Sync sparse weight-sync
scitix/sglang: feat(sparserl-sync): add sparse weight-sync path to ModelRunner

Modifications

`megatron/core/optimizer/distrib_optimizer.py`

Best-effort from sparse_update import init_sparse_manager, sparse_diff_context; falls back to nullcontext / no-op when not installed.
_build_model_and_main_param_groups: calls init_sparse_manager(model_param, shard_model_weight, shard_main_weight, param_range) for both fp16/bf16 and fp32 param groups so the SparseManager knows which DP-local buffer offset maps to which model param.
_copy_main_params_to_model_params: wraps shard_model_param.data.copy_(shard_main_param) with sparse_diff_context(shard_model_param, shard_main_param) so the manager snapshots pre/post state and emits per-param sparse indices.

`megatron/core/optimizer/cpu_offloading/hybrid_optimizer.py`

Best-effort from sparse_update import sparse_diff_context.
Wraps both copy-back hooks (param_copy_back_gpu_hook H2D copy and fp32_param_copy_back_gpu_hook fp32→fp16 copy) with sparse_diff_context(target, source).

`megatron/core/pipeline_parallel/p2p_communication.py`

Passes the group positional argument to all four torch.distributed.P2POp(...) constructors. It was previously omitted, which causes deadlocks when SparseRL-Sync's NCCL group overlaps with the world group during sparse weight-sync.

Pre-checks

I want this PR in a versioned release and have added the appropriate Milestone
I have added relevant unit tests
I have added relevant functional tests
I have added proper typing to my code
I have added relevant documentation
I have run the autoformatter.sh on my PR

Notes for reviewers

All three import blocks are try/except ImportError; removing the
sparse_update package from the environment restores byte-identical
upstream behaviour with zero code changes.
init_sparse_manager is a one-time bind per shard slice; it does not
allocate any GPU memory itself.
sparse_diff_context is a lightweight CUDA-stream-safe context manager; it
does not add synchronisation points beyond the existing stream ordering.
The p2p_communication.py fix is independent of SparseRL-Sync and is safe
to cherry-pick separately.

…o optimizer Wrap the in-place param.copy_() calls in DistributedOptimizer and HybridDeviceOptimizer with sparse_diff_context() so the attached SparseManager can snapshot pre/post state and build per-param sparse-update indices for the Trainer→Rollout weight broadcast. Also pass the communication group to P2POp constructors in p2p_communication.py (group was omitted; required for non-default PG usage). Co-Authored-By: Claude <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sparserl-sync): integrate SparseRL-Sync sparse diff tracking into optimizer#5

feat(sparserl-sync): integrate SparseRL-Sync sparse diff tracking into optimizer#5
tonic-scitix wants to merge 1 commit into
scitix:sirl-devfrom
tonic-scitix:feat/sparserl-sync

tonic-scitix commented May 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tonic-scitix commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

feat(sparserl-sync): integrate SparseRL-Sync sparse diff tracking into optimizer

What does this PR do?

Modifications

megatron/core/optimizer/distrib_optimizer.py

megatron/core/optimizer/cpu_offloading/hybrid_optimizer.py

megatron/core/pipeline_parallel/p2p_communication.py

Pre-checks

Notes for reviewers

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tonic-scitix commented May 15, 2026 •

edited

Loading

`megatron/core/optimizer/distrib_optimizer.py`

`megatron/core/optimizer/cpu_offloading/hybrid_optimizer.py`

`megatron/core/pipeline_parallel/p2p_communication.py`